Building a robust linear model with forward selection and stepwise procedures

Authors

  • Jafar A. Khan
  • Stefan Van Aelst
  • Ruben H. Zamar
Abstract

Classical step-by-step algorithms, such as forward selection (FS) and stepwise (SW) methods, are computationally convenient but yield poor results when the data contain outliers or other contamination. Robust model selection procedures, on the other hand, are neither computationally efficient nor scalable to high dimensions, because they require fitting a large number of submodels. Robust and computationally efficient versions of FS and SW are proposed. Since FS and SW can be expressed in terms of sample correlations, simple robustifications are obtained by replacing these correlations with robust counterparts. A pairwise approach is used to construct the robust correlation matrix, not only because of its computational advantages over the d-dimensional approach, but also because it is more consistent with the idea of step-by-step algorithms. The proposed robust methods perform much better than standard FS and SW, and they are computationally efficient and scalable to large, high-dimensional datasets.
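The key observation in the abstract is that FS depends on the data only through a correlation matrix: at each step, classical FS admits the candidate with the largest absolute partial correlation with the response given the variables already selected, and those partial correlations can be computed from the correlation matrix alone. The sketch below illustrates that idea, assuming NumPy is available. It is not the authors' implementation: the Gnanadesikan-Kettenring (median/MAD) pairwise correlation is a swapped-in stand-in for whichever robust bivariate estimator the paper uses, the fixed `n_steps` replaces a proper stopping rule, and all function names (`gk_correlation`, `pairwise_correlation_matrix`, `forward_selection`) are hypothetical. Only FS is shown; SW would additionally check, after each entry, whether a previously selected variable should be dropped.

```python
import numpy as np


def mad_scale(v):
    """Median absolute deviation, scaled to estimate the SD under normality."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))


def gk_correlation(a, b):
    """Pairwise robust correlation via the Gnanadesikan-Kettenring identity.

    Stand-in estimator (an assumption, not necessarily the paper's choice):
    robustly standardize both variables with median/MAD, then compare the
    robust variances of their sum and their difference.
    """
    u = (a - np.median(a)) / mad_scale(a)
    v = (b - np.median(b)) / mad_scale(b)
    s_plus = mad_scale(u + v) ** 2
    s_minus = mad_scale(u - v) ** 2
    return (s_plus - s_minus) / (s_plus + s_minus)


def pairwise_correlation_matrix(data, corr):
    """Fill a symmetric matrix one pair of columns at a time using `corr`."""
    p = data.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = corr(data[:, i], data[:, j])
    return R


def forward_selection(R, n_steps):
    """Forward selection that only looks at a correlation matrix R.

    The response is the last row/column of R. At each step the candidate
    with the largest absolute partial correlation with y, given the
    already-selected covariates, enters the model.
    """
    y = R.shape[0] - 1
    active, candidates = [], list(range(y))
    for _ in range(min(n_steps, y)):
        best, best_score = None, -1.0
        for j in candidates:
            if active:
                R_AA = R[np.ix_(active, active)]
                r_jA, r_yA = R[j, active], R[y, active]
                # Partial covariance and variances after adjusting for the active set
                cov_jy = R[j, y] - r_jA @ np.linalg.solve(R_AA, r_yA)
                var_j = 1.0 - r_jA @ np.linalg.solve(R_AA, r_jA)
                var_y = 1.0 - r_yA @ np.linalg.solve(R_AA, r_yA)
                if var_j <= 0 or var_y <= 0:
                    continue  # pairwise matrices need not be positive definite
                score = abs(cov_jy) / np.sqrt(var_j * var_y)
            else:
                score = abs(R[j, y])
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        active.append(best)
        candidates.remove(best)
    return active


if __name__ == "__main__":
    # Hypothetical data: y depends on columns 0 and 2; 5% of responses are gross outliers.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=200)
    y[:10] += 50.0
    data = np.column_stack([X, y])
    print("classical FS:", forward_selection(np.corrcoef(data, rowvar=False), 2))
    print("robust FS:   ", forward_selection(pairwise_correlation_matrix(data, gk_correlation), 2))
```

Because `forward_selection` takes any correlation matrix, the classical and robust variants differ only in how `R` is built: `np.corrcoef` gives Pearson correlations, while `pairwise_correlation_matrix` accepts any bivariate estimator (a rank correlation could be plugged in the same way). Building the matrix pair by pair mirrors the pairwise approach described above; the price is that the result may not be positive definite, which the sketch handles by skipping candidates with non-positive residual variance.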

Similar resources

Model Selection in Linear Mixed Effects Models Using SAS PROC MIXED

Although there are disadvantages associated with model building procedures such as backward, forward and stepwise procedures (e.g. multiple testing, arbitrary significance level used in dropping or acquiring variables), many analysts use these procedures and are not aware that alternative modeling selection methods exist. This paper focuses on model selection using the Akaike Information Criter...

Primal and dual robust counterparts of uncertain linear programs: an application to portfolio selection

This paper proposes a family of robust counterparts for uncertain linear programs (LP), obtained for a general definition of the uncertainty region. The relationship between uncertainty sets using norm bodies and their corresponding robust counterparts defined by dual norms is presented. These properties lead us to characterize primal and dual robust counterparts. The researchers show t...

Design of An Integrated Robust Optimization Model for Closed-Loop Supply Chain and supplier and remanufacturing subcontractor selection

The development of optimization and mathematical models for closed-loop supply chain (CLSC) design has attracted considerable interest over the past decades. However, the uncertainties inherent in network design challenge the capabilities of the developed tools. In CLSC, uncertainty in demand is a major source of uncertainty. The aim of this paper, therefore, is to present a Rob...

A robust multi-objective global supplier selection model under currency fluctuation and price discount

A robust supplier selection problem is proposed using a scenario-based approach, where demand and exchange rates are subject to uncertainty. First, a deterministic multi-objective mixed integer linear programming model is developed; then, the robust counterpart of the proposed mixed integer linear programming model is presented using the recent extension in robust optimization theory. We discuss dec...

Forward Selection Procedure for Linear Model Building Using Spearman’s Rank Correlation

Forward selection (FS) is a step-by-step model-building algorithm for linear regression. The FS algorithm was expressed in terms of sample correlations where Pearson’s product-moment correlation was used. The FS yields poor results when the data contain contaminations. In this article, we propose the use of Spearman’s rank correlation in FS. The proposed method is called FSr. We conduct an exte...

Journal:
  • Computational Statistics & Data Analysis

Volume 52

Publication date: 2007